\documentclass{refrep}
\settextfraction{.9}
\usepackage{extramarks}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{amsfonts}
\usepackage{tikz}
\usepackage[plain]{algorithm}
\usepackage{algpseudocode}
\usepackage{listings}
\usepackage{datetime}
\usepackage{xcolor}
\usepackage{float}
\usepackage{graphicx}
\usepackage[export]{adjustbox}
\usepackage{caption}
\usepackage{contour}
\makeatletter
\newcount\dirtree@lvl
\newcount\dirtree@plvl
\newcount\dirtree@clvl
\def\dirtree@growth{
  \ifnum\tikznumberofcurrentchild=1\relax
  \global\advance\dirtree@plvl by 1
  \expandafter\xdef\csname dirtree@p@\the\dirtree@plvl\endcsname{\the\dirtree@lvl}
  \fi
  \global\advance\dirtree@lvl by 1\relax
  \dirtree@clvl=\dirtree@lvl
  \advance\dirtree@clvl by -\csname dirtree@p@\the\dirtree@plvl\endcsname
  \pgf@xa=.25cm\relax
  \pgf@ya=-.5cm\relax
  \pgf@ya=\dirtree@clvl\pgf@ya
  \pgftransformshift{\pgfqpoint{\the\pgf@xa}{\the\pgf@ya}}%
  \ifnum\tikznumberofcurrentchild=\tikznumberofchildren
  \global\advance\dirtree@plvl by -1
  \fi
}

\tikzset{
  dirtree/.style={
    growth function=\dirtree@growth,
    every node/.style={anchor=north},
    every child node/.style={anchor=west},
    edge from parent path={(\tikzparentnode\tikzparentanchor) |- (\tikzchildnode\tikzchildanchor)}
  }
}

\makeatother
\graphicspath{ {images/vc707/} {images/vc709/} {images/qsys/} {images/ipc/} {images/perf/} {images/}}
\newcommand{\figurewidth}{350px}
\newcommand{\VivadoVer}{2014.4}
\newcommand{\QuartusVer}{14.1}
\newcommand{\RIFFAVer}{2.2.2}
\newcommand{\Directory}[1]{\textit{#1}}
\newcommand{\EnvVariable}[1]{\textbf{#1}}
\newcommand{\TermCmd}[1]{\$ \texttt{#1}}
\newcommand{\TODO}[1]{\textbf{\color{red}{#1}}}
\newcommand{\Xilinx}[1]{{\color{red}{#1}}}
\newcommand{\Altera}[1]{{\color{blue}{#1}}}
\newcommand{\ConfigSetting}[1]{\textbf{#1}}
\newcommand{\VCSevenOhNineExampleDesignsLong}{VC709\_Gen1x8If64 (PCIe Gen1, 8 lanes, 64-bit CHNL interface), VC709\_Gen2x8If128 (PCIe Gen2, 8 lanes, 128-bit CHNL interface), VC709\_Gen3x4If128 (PCIe Gen3, 8 lanes, 128-bit CHNL interface)}
\newcommand{\VCSevenOhNineExampleDesignsShort}{VC709\_Gen1x8If64, VC709\_Gen2x8If128, VC709\_Gen3x4If128}
\newcommand{\RIFFAParameter}[1]{\textit{\textbf{#1}}}

\title{RIFFA \RIFFAVer~ Documentation}
\author{Dustin Richmond, Matt Jacobsen}
\date{\today}


\begin{document}

\maketitle
\pagebreak
\tableofcontents

\chapter{Introduction: RIFFA}
\section{What is RIFFA}

RIFFA (Reusable Integration Framework for FPGA Accelerators) is a simple
framework for communicating data from a host CPU to a FPGA via a PCI Express
bus. The framework requires a PCIe enabled workstation and a FPGA on a board
with a PCIe connector. RIFFA supports Windows and Linux, Altera and Xilinx, with
bindings in C/C++, Python, MATLAB and Java.

On the software side there are two main functions: data send and data
receive. These functions are exposed via user libraries in C/C++, Python,
MATLAB, and Java. The driver supports multiple FPGAs (up to 5) per system. The
software bindings work on Linux and Windows operating systems. Users can
communicate with FPGA IP cores by writing only a few lines of code.

On the hardware side, users access an interface with independent transmit and
receive signals. The signals provide transaction handshaking and a first word
fall through FIFO interface for reading/writing data to the host. No knowledge
of bus addresses, buffer sizes, or PCIe packet formats is required. Simply send
data on a FIFO interface and receive data on a FIFO interface. RIFFA does not
rely on a PCIe Bridge and therefore is not subject to the limitations of a
bridge implementation. Instead, RIFFA works directly with the PCIe Endpoint and
can run fast enough to saturate the PCIe link.

RIFFA communicates data using direct memory access (DMA) transfers and interrupt
signaling. This achieves high bandwidth over the PCIe link. In our tests we are
able to saturate (or near saturate) the link in all our tests. The RIFFA distribution
contains examples and guides for setting up designs on several standard development
boards. 

\begin{figure}[H]
  \includegraphics[width=300px,center]{bw_sg_2_1_maxs.png}
  \caption{Graph of Bandwidth vs Transfer Size}
  \label{Fig:RIFFA:Performance}
\end{figure}

RIFFA \RIFFAVer~is significantly more efficient than its predecesor RIFFA
1.0. RIFFA \RIFFAVer~is able to saturate the PCIe link for nearly all link
configurations supported. Figure~\ref{Fig:RIFFA:Performance} shows the
performance of designs using the 32 bit, 64 bit, and 128 bit interfaces. The
colored bands show the bandwidth region between the theoretical maximum and the
maximum achievable. PCIe Gen 1 and 2 use 8 bit / 10 bit encoding which limits
the maximum achievable bandwidth to 80\% of the theoretical. Our experiments
show that RIFFA can achieve 80\% of the theoretical bandwidth in nearly all
cases. The 128 bit interface achieves 76\% of the theoretical maximum.

If you are using RIFFA on a new platform not listed above let us know and we’ll
help you out!

%\pagebreak
\section{Licensing}

Copyright (c) 2016, The Regents of the University of California All rights
reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

\begin{itemize}
    \item Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    \item Redistributions in binary form must reproduce the above
      copyright notice, this list of conditions and the following
      disclaimer in the documentation and/or other materials provided
      with the distribution.

    \item Neither the name of The Regents of the University of California
      nor the names of its contributors may be used to endorse or
      promote products derived from this software without specific
      prior written permission.
\end{itemize}

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL REGENTS OF THE UNIVERSITY OF CALIFORNIA BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


\pagebreak
\chapter{Getting Started}
\section{Development Board Support in RIFFA \RIFFAVer}
RIFFA \RIFFAVer~supports:
\begin{itemize}
\item The VC707, ZC706 and similar boards with the Xilinx IP Core 7-Series
  Integrated Block for PCI Express. Example designs for the VC707 and ZC706
  boards are provided, and contain this core. The current distribution supports
  all 64-bit interfaces for these devices, with 128-bit support coming soon
  after the initial release. (Support for the 128-bit interface is in RIFFA 2.1,
  but is temporarily mising due to changes)
\item The VC709 board and similar boards with the Xilinx IP Gen3 Integrated
  Block for PCI Express. Example designs for the VC709 are provided, and contain
  this core. The current distribution supports all 64-bit and 128-bit AXI
  interfaces. 256-bit (PCIe Gen3 x8) support is planned for a later date.
\item The DE5-Net board and similar boards with the Stratix V, Cyclone V, and
  Arria V, Hard IP for PCI express (Avalon Streaming Interface). Example designs
  for the DE5-net board are provided, and contain the Stratix V version of this
  core. The current distribution supports all 64-bit and 128-bit Avalon
  Streaming interfaces.
\item The DE4 and similar boards with the IP Compiler for PCI Express Core,
  supporting Stratix IV, Cyclone IV and Arria II devices. Example designs for
  the DE4 board are provided. The current distribution supports all 64-bit and
  128-bit Avalon Streaming interfaces.
\end{itemize}
\section{Understanding this User Guide}
In this user guide, we use the following conventions:
\begin{center}
  \begin{tabular}{ | l | l |}
    \hline
    Object & Example \\ \hline
    Directories and Paths & \Directory{RIFFA \RIFFAVer/source/fpga/riffa}  \\ \hline
    Xilinx Specific Content & \Xilinx{vc709} \\ \hline
    Altera Specific Content & \Altera{de5} \\ \hline
    Configuration Setting & \ConfigSetting{Number of Lanes} \\ \hline
    Terminal Command, Code Snippet & \TermCmd{echo ``Hello World''}\\ \hline
    RIFFA Parameter & \RIFFAParameter{C\_NUM\_CHNL}\\\hline
  \end{tabular}
\end{center}

\section{Decoding What's Provided}
\label{Sec:Intro:Decoding}
Fig~\ref{Fig:RIFFA:DirStructure} shows the directory hierarchy of RIFFA. This
instruction manual uses this directory tree when specifying all directory paths.

The \Directory{RIFFA \RIFFAVer/source/fpga/} contains a directory for each board
we have tested for the current distribution:\Altera{de5, de4}, \Xilinx{VC709,
  VC707, ZC706}. Each board directory has several example project directories
(e.g. \Altera{DE5Gen1x8If64} and \Xilinx{VC709\_Gen1x8If64}). Each example
project directory has 5 sub-directories:
\begin{itemize}
\item \Directory{prj/} contains all of the project files (\Altera{.qsf,.qpf}, \Xilinx{.xpr}).
\item \Directory{ip/} contains all of the ip files (\Altera{.qsys}, \Xilinx{.xci}) generated for the project, when permitted by licensing agreements.
\item \Directory{bit/} contains the example programming file for the corresponding FPGA example design. \Altera{Quartus} and \Xilinx{Vivado} do not modify this programming (\Altera{.sof}, \Xilinx{.bit}).
\item \Directory{constr/} contains the user constraint files (\Altera{.sdc}, \Xilinx{.xdc}).
\item \Directory{hdl/} contains any example-project specific Verilog files, such as the project top level file.
\end{itemize}

\begin{figure}[H]
  \begin{tikzpicture}[dirtree]
    \node {RIFFA \RIFFAVer}
    child { node {Documentation}
      child { node {RIFFA User Guide.pdf (This Document)} }
    }
    child { node {install}
      child { node {windows}
        child{ node {setup.exe}}
        child{ node {setup\_debug.exe}}
      }
      child { node {linux}
        child{ node {README.txt}}
      }
    }
    child { node {source}
      child { node {c\_c++} }
      child { node {java} }
      child { node {matlab} }
      child { node {python} }
      child { node {driver}
        child { node {linux}
          child{ node {makefile}}
          child{ node {... (Other source files)}}
        }
        child { node {windows}
          child{ node {... (Other source files \& directories)}}
        }
      }
      child { node{fpga}
        child { node{riffa}
          child {node{(RIFFA Source)}}
        }
        child { node{\Altera{de5}}
          child {node{\Altera{DE5Gen1x8If64}}
            child{node{prj}}
            child{node{bit}}
            child{node{ip}}
            child{node{constr}}
            child{node{hdl}}
          }
          child {node{... (Other example projects)}}
        }
        child { node{\Altera{de4}}
          child {node{\Altera{DE4Gen1x8If64}}}
          child {node{...}}
        }
        child { node{\Xilinx{vc709}}
          child {node{\Xilinx{VC709\_Gen1x8If64}}}
          child {node{...}}
        }
        child { node{\Xilinx{zc706}}
          child {node{\Xilinx{ZC706\_Gen1x8If64}}}
          child {node{...}}
        }
        child { node{\Xilinx{vc707}}
          child {node{\Xilinx{VC707\_Gen1x8If64}}}
          child {node{...}}
        }
      }
    };
  \end{tikzpicture}
  \caption{Directory hierarchy of the RIFFA \RIFFAVer~distribution} \label{Fig:RIFFA:DirStructure}
\end{figure}

\section{Release Notes}
\subsection{Version 2.2.2}
\begin{itemize}
\item Fixed: Unsigned Windows Driver (Commit: 4e989fc)

\item Fixed: A bug in the clock-crossing interface where a low-frequency user
  clock (<40 MHz) could cause incorrect channel behavior. (Commit: 778c42e)

\item Fixed: Support for 64-bit pointers in Python 3. Shout out to 
  @jrobrien (Commit: af7c592)

\item Fixed: Includes in linux driver. linux/slab.h was not included in 
  riffa\_driver.c (Commit: cd494e1)

\item Fixed: Capability backwards/forwards compatibility issues to support Linux
  for Tegra. (Commit: 1d228c1)

\item New: Support for new get\_user\_pages API in the linux kernel. Shout out to
  @marzoul (Commit: TBD)

\item Removed: Non-Qsys DE5 Board Example Designs (QSys > IP Generator)
  @marzoul (Commit: TBD)

\end{itemize}
\subsection{Version 2.2.1}
\begin{itemize}
\item New: Reset logic for the Engine layer to handle RIFFA induced resets

\item New: Stability/multi-thread-concurrency warnings in the Linux driver 
  (Shoutout to @marzoul)

\item Fixed: A bug in the Linux Driver that prevented compilation on older kernels

\item Fixed: Windows driver issue for back-to-back small transfers (See 2.2.0)

\item Fixed: WORD\_ENABLE bug fix for the Classic Xilinx (VC707, ZC706, AC701, 
  KC705) 128-bit interface

\item Fixed: TX Engine Buffer sizing (high-bandwidth transfers occasionally had 
  corruption)

\item Fixed: RX Engine rx\_st\_valid bug fix for Altera IP Compiler for PCI express 
  (Cyclone IV, Stratix IV)
\end{itemize}

\subsection{Version 2.2.0}
\begin{itemize}
\item Added: Support for the new Gen3 Integrated Block for PCIe Express, and the VC709 Development board.
\item Added: ZC706 Example Designs
\item Changed: Xilinx example project packaging. All Xilinx Virtex 7 projects are now click-to-compile, and come with instantiated IP.
\item Re-wrote and refactored: Various parts of the TX and RX engines to maximize code reuse between different vendors and PCIe endpoint implementations
\item Fixed: A bug in the Linux Driver that prevented compilation on older kernels
\item Fixed: A bug in the Windows Driver that prevented repeated small transfers.
\end{itemize}
\subsection{Version 2.1.0}
\begin{itemize}

\item Added reorder\_queue and updated many rx/tx engine and channel modules that 
  use it.
\item Added parameters for number of tags to use and max payload length for sizing 
  RAM for reorder\_queue.
\item Fixed: Bug in the riffa\_driver.c, too few circular buffer elements.
\item Fixed: Bug in the riffa\_driver.c, bad order in which interrupt vector bits 
  were processed. Can cause deadlock in heavy use situations.
\item Fixed: Bug in the tx\_port\_writer.v, maxlen did not start with a value of 1.
  Can cause deadlock behavior on second transfer.
\item Fixed: Bug in the rx\_port\_reader.v, added delay to allow FIFO flush to 
  propagate.
\item Fixed: Bug in rx\_port\_xxx.v, changed to use FWFT FIFO instead of existing 
  logic that could cause CHNL\_RX\_DATA\_VALID to drop for a cycle after CHNL\_RX 
  dropped even when there is still data in the FIFO. Can cause premature 
  transmission termination.
\item Changed rx\_port\_channel\_gate.v to use FWFT FIFO.
\item Removed unused signal from rx\_port\_requester\_mux.v.
\item Fixed: Typo/bug that would attempt to change state within 
  tx\_port\_monitor\_xxx.v.
\item Added flow control for receive credits to avoid over driving upstream transactions
  (applies to Altera devices).
\end{itemize}

\subsection{Version 2.0.2}
\begin{itemize}

\item Fixed: Bug in Windows and Linux drivers that could report data sent/received 
  before such data was confirmed.
\item Fixed: Updated common functions to avoid assigning input values.
\item Fixed: FIFO overflow error causing data corruption in tx\_engine\_upper and
  breaking the Xilinx Endpoint.
\item Fixed: Missing default cases in rx\_port\_reader, \\
  sg\_list\_requester, tx\_engine\_upper, and tx\_port\_writer.
\item Fixed: Bug in tx\_engine\_lower\_128 corrupting s\_axis\_tx\_tkeep, causing Xilinx
  PCIe endpoint core to shut down.
\item Fixed: Bug in tx\_engine\_upper\_128 causing incomplete TX data timeouts.
\item Changed rx\_engine to not block on nonposted TLPs. They're added to a FIFO and
  serviced in order. 
\item Reset rx\_port FIFOs before a receive transaction to avoid data corruption from
  replayed TLPs.
\end{itemize}

\subsection{Version 2.0.1}

\begin{itemize}
\item RIFFA 2.0.1 is a general release. This means we've tested it in a number of
  ways. Please let us know if you encounter a bug. 
\item Neither the HDL nor the drivers from RIFFA 2.0.1 are backwards compatible with
  the components of any previous release of RIFFA.
\item RIFFA 2.0.1 consumes more resources than 2.0 beta. This is because 2.0.1 was
  rewritten to support scatter gather DMA, higher bandwidth, and appreciably 
  more signal registering. The additional registering was included to help meet 
  timing constraints. 
\item The Windows driver is supported on Windows 7 32/64. Other Windows versions 
  can be supported. The driver simply needs to be built for that target.
\item Debugging on Windows is difficult because there exists no system log file.
  Driver log messages are visible only to an attached kernel debugger. So to see
  any messages you'll need the Windows Development Kit debugger (WinDbg) or a 
  small utility called DbgView. DbgView is a standalone kernel debug viewer that
\item   http://technet.microsoft.com/ens/sysinternals/bb896647.aspx
  Run DbgView with administrator privileges and be sure to enable the following
  capture options: Capture Kernel, Capture Events, and Capture Verbose Kernel 
  Output.
\item The Linux driver is supported on kernel version 2.6.27+. 
\item The Java bindings make use of a native library (in order to connect Java JNI
  to the native library). Libraries for Linux and Windows for both 32/64 bit 
  platforms have been compiled and included in the riffa.jar.
\item Removed the CHNL\_RX\_ERR signal from the channel interface. Error handling now
  ends the transaction gracefully. Errors can be easily detected by comparing
  the number of words received to the CHNL\_RX\_LEN amount. An error will cause
  CHNL\_RX will go low prematurely and not provide the advertised amount of data.
\item Fixed: Bug in sg\_list\_requester which could cause an unbounded TLP request.
\item Fixed: Bug in tx\_port\_buffer\_128 which could stall the TX transaction.
\end{itemize}
\section{Errata}
While we have extensively tested the current distribution, we are human and
cannot eliminate all bugs in our distribution. As a general rule of thumb, if
you find yourself delving into the RIFFA code, you have gone too far. Contact us
if you need additional assistance!

See the following notes for issues we are currently tracking:

\subsection{Windows}
\subsection{Linux}
No open issues
\subsection{Altera}

\textbf{Issue 3: No support for the 256-bit, Gen3x8 Interface} Coming soon...
\subsection{Xilinx (Classic)}

\textbf{Issue 1: Missing example designs for ML605} There is no disadvantage to
using RIFFA 2.1.0 until we return support in a future distribution.

\textbf{Issue 2: Missing example design for Spartan 6 LXT Development board} The
32-bit interface support has been removed from RIFFA 2.2 and may be added back
in the future. Please use RIFFA 2.1 in the meantime

\subsection{Xilinx (Ultrascale)}
\textbf{Issue 1: No support for the 256-bit, Gen3x8 Interface} Coming soon...

\pagebreak

\chapter{Installing the RIFFA driver}
\label{Sec:RIFFA:Installation}
\section{Linux}
To install the RIFFA driver in linux, you must build it against your installed
version of the Linux kernel. RIFFA \RIFFAVer~comes with a makefile that will
install the necessar linux kernel headers and the driver. This makefile will
also build and install the C/C++ native library. To install RIFFA \RIFFAVer~in
linux, follow these instructions:

\begin{enumerate}
\item Open a terminal in linux and navigate to the \Directory{RIFFA \RIFFAVer/source/driver/linux} directory.
\item Ensure you have the kernel headers installed, run:

  \TermCmd{sudo make setup}

  This will attempt to install the kernel headers using your system's package
  manager. You can skip this step if you've already installed the kernel headers.
\item Compile the driver and C/C++ library:

  \TermCmd{make}

  or

  \TermCmd{make debug}

  Using make debug will compile in code to output debug messages to the system log
  at runtime. These messages are useful when developing your design. However they
  pollute your system log and incur some overhead. So you may want to install the
  non-debug version after you've completed development.
\item Install the driver and library:

  \TermCmd{sudo make install}

  The system will be configured to load the driver at boot time. The C/C++ library
  will be installed in the default library path. The header files will be placed
  in the default include path. You will need to reboot after you've installed for
  the driver to be (re)loaded. 
\item If the driver is installed and there is a RIFFA \RIFFAVer~configured FPGA when the computer boots, the
  driver will detect it. Output in the system log will provide additional
  information. 
\item The C/C++ code must include the riffa.h header. An example inclusion is
  shown in Listing~\ref{Listing:RIFFA:Include}
\item When compiling (using GCC/G++, etc.) you must link with the RIFFA
  libraries using the -lriffa flag. For example, when compiling test.c from
  Listing~\ref{Listing:RIFFA:Include}:

  \TermCmd{gcc -g -c -lriffa -o test.o test.c}
\item Bindings for other languages can be installed by following the README
  files in their respective directories (See Figure~\ref{Fig:RIFFA:DirStructure}
\end{enumerate}

\section{Windows}
Currently only Windows 7 (32/64) is supported by RIFFA \RIFFAVer. In the
\Directory{RIFFA \RIFFAVer/install/windows/} subdirectory use the
provided setup.exe program to install the RIFFA driver and native C/C++
library. You can verify that RIFFA \RIFFAVer~installed correctly by checking
the installation directory in Program Files. After installation, you'll be able
to install the bindings for other languages.

The setup\_dbg.exe installer installs a driver with additional debugging output.
You can install the setup\_dbg.exe version and then later use setup.exe to
install the non-debug output version.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=C,
    commentstyle=\color{red},label=Listing:RIFFA:Include,
    caption=Inclusion of the RIFFA header files in a user application,frame=single]
  #include <stdio.h>
  #include <stdlib.h>
  #include <riffa.h>
  #define BUF_SIZE (1*1024*1024)

  unsigned int buf[BUF_SIZE];

  int main(int argc, char* argv[]) {
    fpga_t * fpga;
    int fid = 0; // FPGA id
    int channel = 0; // FPGA channel

    fpga = fpga_open(fid);
    fpga_send(fpga, channel, (void *)buf, BUF_SIZE, 0, 1, 0);
    fpga_recv(fpga, channel, (void *)buf, BUF_SIZE, 0);
    fpga_close(fpga);
    return 0;
  }
\end{lstlisting}


\chapter{Compiling and using the Xilinx Example Designs}
Vivado \VivadoVer~was used in all example designs and documentation included in
this distribution. We highly recommend using \VivadoVer~and all newer versions
of the software, since we have encountered bugs in previous versions of the
Vivado (e.g. 2014.2) software. This guide assumes that the end-user has already
configured their board for PCI Express operation. See the VC709 User Guide
\footnote{http://www.xilinx.com/support/documentation/boards\_and\_kits/vc709/ug887-vc709-eval-board-v7-fpga.pdf},
VC707 User Guide
\footnote{http://www.xilinx.com/support/documentation/boards\_and\_kits/vc707/ug848-VC707-getting-started-guide.pdf}
or ZC706 User Guide
\footnote{http://www.xilinx.com/support/documentation/boards\_and\_kits/zc706/ug954-zc706-eval-board-xc7z045-ap-soc.pdf}.

While we have not tested all of the current-generation Xilinx development
boards, we are confident that they can be supported with minimal
modifications. For more information about supporting new boards, see the
sections \ref{Sec:7SeriesIntegrated:Generating} and
\ref{Sec:Gen3Integrated:Generating}. These sections cover the settings used
in the RIFFA example design IP.

The easiset way to use RIFFA is to start with one of the example designs
included in the distribution. Sections \ref{Sec:7SeriesIntegrated:ExampleDesign}
and \ref{Sec:Gen3Integrated:ExampleDesign} describe how to use and compile these
designs for the VC707 and VC709 boards respectively. These example designs are
ready to compile out of the box, and require no user IP configuration and
generation. The designs also include pre-compiled bit-files in the
\Directory{bit} directory of the example project. For advanced users, we also
describe how we generated the PCIe IP in sections
\ref{Sec:7SeriesIntegrated:Generating} and
\ref{Sec:Gen3Integrated:Generating}.

\section{Classic - 7 Series Integrated Block for PCI Express - (VC707, ZC706 and older)}  
This is a step by step guide for using RIFFA \RIFFAVer~on a Xilinx FPGA with the
7 Series Integrated Block for PCI Express Core. This core is supported on the
ZC706, and VC707 development boards, using the 64-bit and 128-bit AXI
interfaces.

\subsection{VC707 and ZC706 Example Designs}
\label{Sec:7SeriesIntegrated:ExampleDesign}
There is one VC707 example design and two ZC706 example designs in the RIFFA
\RIFFAVer~distribution. The VC707 example design folders are in \Directory{RIFFA
  \RIFFAVer/source/fpga/vc707} and the ZC706 example design folders are in
\Directory{RIFFA \RIFFAVer/source/fpga/zc706}.

\begin{enumerate}
\item Open Vivado to get the introductory screen shown in
  Figure~\ref{Fig:Vivado:WelcomeScreen}.
\item Click 'Open an Existing Project' and navigate to your RIFFA
  \RIFFAVer~directory.
\item In the RIFFA \RIFFAVer~distribution, open \Directory{RIFFA
  \RIFFAVer/source/fpga/xilinx/vc707/} or \Directory{RIFFA
  \RIFFAVer/source/fpga/zc706} and choose from one of the existing example
  design directories for your board. In the example design directory, locate the
  \Directory{prj} folder and open it. Select the .xpr file and click open. This
  will open the example project, as shown in
  Figure~\ref{Fig:7SeriesIntegrated:ExampleDesign:ProjectOpened}.
\item This project was compiled in Vivado \VivadoVer. The bit file generated can
  be used to test the FPGA system. If you are using a newer version of Vivado,
  recompile the example design or use the programming file provided.
  \begin{itemize}
  \item IP Settings are now packaged as part of the example designs! Users no
    longer need to generate IP.
  \item To recompile the example design, click the generate bitstream button in
    the top left corner as shown in
    Figure~\ref{Fig:7SeriesIntegrated:ExampleDesign:ProjectOpened}.
  \item Recompiling your design will generate a new bitfile in the Xilinx
    project. The bit file in the \Directory{bit} will not be changed.
  \end{itemize}
\item To program the FPGA, click 'Open Hardware Manager'. New bit files
  (generated by Vivado) will appear in Vivado's internal directory. An example
  bit file is provided in the example design's \Directory{bit}. Load the
  bitstream to your VC707 or ZC706 board and restart your computer.
  \begin{itemize}
  \item Before programming your FPGA, you should install the RIFFA driver. See
    Section~\ref{Sec:RIFFA:Installation}
  \end{itemize}
\item The example design uses the chnl\_tester (shown in
  Figure~\ref{Fig:Vivado:ExampleDesign:chnl_tester}, which works with
  the example software in the \Directory{source/\{C\_C++,Java,python,matlab\}}
  directories. Replace the chnl\_tester instantiation with any user logic,
  matching the RIFFA interface.
\item Recompile the design and program the FPGA Device. Changing the
  \RIFFAParameter{C\_NUM\_CHNL} will change the number of independent channel
  interfaces
\end{enumerate}
\begin{figure}
  \includegraphics[width=400px,center]{VivadoWelcomeScreen.png}
  \caption{Welcome Screen for Vivado \VivadoVer}
  \label{Fig:Vivado:WelcomeScreen}
\end{figure}
\begin{figure}
  \includegraphics[width=400px,center]{7SeriesIntegratedOpenProject.png}
  \caption{Project Splash Screen for 7Series Integrated Block for PCI Express Projects}
  \label{Fig:7SeriesIntegrated:ExampleDesign:ProjectOpened}
\end{figure}
\begin{figure}
  \includegraphics[width=400px,trim=500 200 200 250, clip=true,center]{VivadoChnlTesterInstantiation.png}
  \caption{Project Splash Screen for 7Series Integrated Block for PCI Express Projects}
  \label{Fig:Vivado:ExampleDesign:chnl_tester}
\end{figure}

\subsection{Generating the 7 Series Integrated Block for PCI Express}
\label{Sec:7SeriesIntegrated:Generating}
The following steps are not required for general users. See the instructions
above for how to compile RIFFA.

Alternatively, it is possible to generate the PCIe Endpoint with different
settings than those provided in the example design. Modifying the RIFFA
parameters \RIFFAParameter{C\_PCI\_DATA\_WIDTH},
\RIFFAParameter{C\_MAX\_PAYLOAD\_BYTES} and \RIFFAParameter{C\_LOG\_NUM\_TAGS},
change certain settings in the IP Core. The \RIFFAParameter{C\_NUM\_LANES} is a
parameter in the top level file of each example project. How these parameters
relate to IP core settings is highlighted in the following figures.

If the goal is to generate a RIFFA design completely from scratch, each board
directory comes with a RIFFA wrapper verilog file and instantiates a
vendor-specific translation layer. It is highly recommended to re-use these
files RIFFA wrapper when creating designs from scratch.

To generate the PCIe IP select the 7 Series Integrated Block for PCI Express
after selecting the IP Catalog shown in
Figure~\ref{Fig:7SeriesIntegrated:ExampleDesign:ProjectOpened}. This will open
the IP Customization window as shown in 
Figure~\ref{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabBasic}
\begin{figure}[H]
  \includegraphics[width=\figurewidth,center]{7SeriesIntegratedTabBasic.png}
  \caption{Basic settings tab.}
  \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabBasic}
\end{figure}
First, select \ConfigSetting{Mode} to \ConfigSetting{ADVANCED} from the drop
down menu. This will cause more tabs to appear in the bar. The following tabs
are not used during customization: Link Registers, Power Mangement,
Ext. Capabilities, Ext. Capabilities 2, TL Settings and DL/PL Settings.

In Figure~\ref{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabBasic} , we
have set the \ConfigSetting{Xilinx Development Board} to \ConfigSetting{VC707},
selected the PCIe Gen1 rate \ConfigSetting{2.5 GT/s}, and a \ConfigSetting{Lane
  Width} of \ConfigSetting{8} (\RIFFAParameter{C\_NUM\_LANES} = 8). We have
chosen to set the \ConfigSetting{AXI Interface Width} to \ConfigSetting{64-bits}
(\RIFFAParameter{C\_PCI\_DATA\_WIDTH} = 64). The choice of Link Rate, Lanes, and
Interface Width will allow different AXI Interface Frequencies to be
selected. The RIFFA core will run at this clock frequency, but the user logic
can run at whatever frequency it desires.

Optional: Set the Component Name of the PCI Express block, and the IP
Location. In our example projects, we use the name template
PCIeGen\textbf{W}x\textbf{Y}If\textbf{Z} where \textbf{W} is the PCI Express
Version (\ConfigSetting{Link Speed} in
Figure~\ref{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabBasic}),
\textbf{Y} is the lane width, and \textbf{Z} is the AXI interface width.  The IP
location is the \Directory{ip} directory in the example project.

\begin{figure}[H]
  \includegraphics[width=350px,center]{7SeriesIntegratedTabIDs.png}
  \caption{PCI Express ID Tab.}
  \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabIDs}
\end{figure}
The tab in Figure~\ref{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabIDs}
is optional. Setting the Device ID may assist in identifying different FPGAs in
a multi-FPGA system. The other options, specifically the Vendor ID, must remain
the same.

\begin{figure}[H]
  \includegraphics[width=350px,center]{7SeriesIntegratedTabBARs.png}
  \caption{PCI Express Base Address Register (BAR) ID Tab.}
  \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabBARs}
\end{figure}
The tab in
Figure~\ref{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabBARs} must be
configured so that BAR0 is \ConfigSetting{enabled} (checked). Set the
\ConfigSetting{Type} to \ConfigSetting{Memory}, and \ConfigSetting{Unit}
\ConfigSetting{Kilobyte}, and \ConfigSetting{Size Value} to \ConfigSetting{1}
from the dropdown menus. If these values are not set correctly the RIFFA driver
will not recognize the FPGA device.

\begin{figure}[H]
  \includegraphics[width=350px,center]{7SeriesIntegratedTabCapabilities.png}
  \caption{PCI Express Capabilities Tab.}
  \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabCapabilities}
\end{figure}
In this tab select the boxes \ConfigSetting{Buffering Optimized for Bus
  Mastering Applications} and \ConfigSetting{Extended Tag Field}. If the
\ConfigSetting{Extended Tag Field} is selected
\RIFFAParameter{C\_LOG\_NUM\_TAGS} = 8, otherwise
\RIFFAParameter{C\_LOG\_NUM\_TAGS} = 5. Select the \ConfigSetting{Maximum
  Payload Size} from the dropdown menu. Use this to set the RIFFA
\RIFFAParameter{C\_MAX\_PAYLOAD\_BYTES} parameter.

Note: Maximum Payload sizes are typically set by the BIOS, and 256 bytes seems
to be standard. RIFFA will default to the minimum of
\RIFFAParameter{C\_MAX\_PAYLOAD\_SIZE} and the setting in your BIOS. Unless your
BIOS is modified, or can support substantially larger packets, there will be no
performance benefit to increasing the payload size. Increasing the maximum
payload size will increase the resources consumed.

% \begin{figure}[H]
%   \includegraphics[width=350px,center]{7SeriesIntegratedTabLinkRegisters.png}
%   \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabLinkRegisters}
% \end{figure}
\begin{figure}[H]
  \includegraphics[width=350px,center]{7SeriesIntegratedTabInterrupts.png}
  \caption{PCI Express Interrupts Tab.}
  \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabInterrupts}
\end{figure}
In the Interrupts Tab shown in
Figure~\ref{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabInterrupts}
\ConfigSetting{clear} the checkbox for \ConfigSetting{Enable INTx} (To disable
INTx). The remaining options should match those shown in
Figure~\ref{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabInterrupts}
% \begin{figure}[H]
%   \includegraphics[width=350px,center]{7SeriesIntegratedTabPowerManagement.png}
%   \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabPowerManagement}
%   \caption{PCI Express Power Manement Tab.}\\ 
% \end{figure}
% \begin{figure}[H]
%   \includegraphics[width=350px,center]{7SeriesIntegratedTabExtCapabilities.png}
%   \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabExtCapabilities}
% \end{figure}
% \begin{figure}[H]
%   \includegraphics[width=350px,center]{7SeriesIntegratedTabExtCapabilities2.png}
%   \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabExtCapabilities2}
% \end{figure}
% \begin{figure}[H]
%   \includegraphics[width=350px,center]{7SeriesIntegratedTabTLSettings.png}
%   \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabTLSettings}
% \end{figure} 
% \begin{figure}[H]
%   \includegraphics[width=350px,center]{7SeriesIntegratedTabDLPLSettings.png}
%   \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabDLPLSettings}
% \end{figure}
\begin{figure}[H]
  \includegraphics[width=350px,center]{7SeriesIntegratedTabSharedLogic.png}
  \caption{Shared Logic Tab}
  \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabSharedLogic}
\end{figure}
In the Shared Logic Tab shown in
Figure~\ref{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabSharedLogic}
\ConfigSetting{clear} all of the checkboxes shown. These settings will not affect the core
generated, but will affect the example designs generated by Vivado. As a result,
the Example Design will mirror the RIFFA example design provided.
\begin{figure}[H]
  \includegraphics[width=350px,center]{7SeriesIntegratedTabCoreInterfaceParameters.png}
  \caption{Core Interface Parameters Tab}
  \label{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabInterfaceParameters}
\end{figure}
Finally, in the Interface Parameters tab, match the checkboxes shown in
Figure~\ref{Fig:7SeriesIntegrated:Generating:7SeriesIntegratedTabInterfaceParameters}. These
options simplify the interface to the generated core

\subsection{Creating Constraints files for the VC707 Development Board}
\label{Sec:7SeriesIntegrated:Generating:Constraints:VC707}
When generating a design for the VC707 board, the following constraints will
correctly constrain the clocks. When using a different board, read the user
guide for appropriate pin placment, or copy the constraints from the PCIe
Endpoint Example Design. The remaining constraints are contained the generated PCIe IP.
\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=tcl,
    commentstyle=\color{red},label=Listing:7SeriesIntegrated:Generating:Constraints:VC707,
    caption=\Xilinx{.xdc} constraints for the VC707 board,frame=single]
  set_property PACKAGE_PIN AV35 [get_ports PCIE_RESET_N]
  set_property IOSTANDARD LVCMOS18 [get_ports PCIE_RESET_N]
  set_property PULLUP true [get_ports PCIE_RESET_N]
  # The following constraints are BOARD SPECIFIC. This is for the VC707
  set_property LOC IBUFDS_GTE2_X1Y5 [get_cells refclk_ibuf]
  create_clock -period 10.000 -name pcie_refclk [get_pins refclk_ibuf/O]
  set_false_path -from [get_ports PCIE_RESET_N]
\end{lstlisting}

\subsection{Creating Constraints files for the ZC706 Development Board}
\label{Sec:7SeriesIntegrated:Generating:Constraints:ZC706}
When generating a design for the ZC706 board, the following constraints will
correctly constrain the clocks. When using a different board, read the user
guide for appropriate pin placment, or copy the constraints from the PCIe
Endpoint Example Design. The remaining constraints are contained the generated PCIe IP.
\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=tcl,
    commentstyle=\color{red},label=Listing:7SeriesIntegrated:Generating:Constraints:ZC706,
    caption=\Xilinx{.xdc} constraints for the ZC706 board,frame=single]
  set_property IOSTANDARD LVCMOS15 [get_ports PCIE_RESET_N]
  set_property PACKAGE_PIN AK23 [get_ports PCIE_RESET_N]
  set_property PULLUP true [get_ports PCIE_RESET_N]
  # The following constraints are BOARD SPECIFIC. This is for the ZC706
  set_property LOC IBUFDS_GTE2_X0Y6 [get_cells refclk_ibuf]
  create_clock -period 10.000 -name pcie_refclk [get_pins refclk_ibuf/O]
  set_false_path -from [get_ports PCIE_RESET_N]
\end{lstlisting}


\pagebreak
\section{Ultrascale - Gen3 Integrated Block for PCI Express - (VC709 and newer)}
This is a step by step guide for building a RIFFA \RIFFAVer~reference design for
Xilinx FPGA's compatible with the Gen3 Integrated Block for PCI Express. In
RIFFA \RIFFAVer~there are three example designs for the VC709 board in the
\Directory{RIFFA \RIFFAVer/source/fpga/vc709} directory:
\VCSevenOhNineExampleDesignsLong. To use one of these example designs, follow
the instructions below.

\subsection{VC709 Example Designs}
\label{Sec:Gen3Integrated:ExampleDesign}
\begin{enumerate}
\item Open Vivado to get the introductory screen shown in
  Figure~\ref{Fig:Vivado:WelcomeScreen}.
\item Click 'Open an Existing Project' and navigate to your RIFFA
  \RIFFAVer~directory.
\item In the RIFFA \RIFFAVer~distribution, open \Directory{RIFFA
  \RIFFAVer/source/fpga/xilinx/vc709/} and choose from one of the existing
  example design directories for your board. In the example design directory,
  locate the \Directory{prj} folder and open it. Select the .xpr file and click
  open. This will open the example project, as shown in
  Figure~\ref{Fig:Gen3Integrated:ExampleDesign:ProjectOpened}.
\item This project was compiled in Vivado \VivadoVer. The bit file generated can
  be used to test the FPGA system. If you are using a newer version of Vivado,
  recompile the example design or use the programming file provided.
  \begin{itemize}
  \item IP Settings are now packaged as part of the example designs! Users no
    longer need to generate IP.
  \item To recompile the example design, click the generate bitstream button in
    the top left corner as shown in
    Figure~\ref{Fig:Gen3Integrated:ExampleDesign:ProjectOpened}.
  \item Recompiling your design will generate a new bitfile in the Xilinx
    project. The bit file in the \Directory{bit} will not be changed.
  \end{itemize}
\item To program the FPGA, click 'Open Hardware Manager'. New bit files
  (generated by Vivado) will appear in the Vivado generated directories. An
  example bit file is provided in the example design's \Directory{bit}. Load the
  bitstream to your VC709 board and restart your computer.
  \begin{itemize}
  \item Before programming your FPGA, you should install the RIFFA driver. See
    Section~\ref{Sec:RIFFA:Installation}
  \end{itemize}
\item The example design uses the chnl\_tester (shown in
  Figure~\ref{Fig:Vivado:ExampleDesign:chnl_tester}, which works with
  the example software in the \Directory{source/\{C\_C++,Java,python,matlab\}}
  directories. Replace the chnl\_tester instantiation with any user logic,
  matching the RIFFA interface.
\item Recompile the design and program the FPGA Device. Changing the
  \RIFFAParameter{C\_NUM\_CHNL} will change the number of independent channel
  interfaces
\end{enumerate}
\begin{figure}
  \includegraphics[width=300px,center]{Gen3IntegratedOpenProject.png}
  \caption{Project Splash Screen for Gen3 Integrated Block for PCI Express Projects}
  \label{Fig:Gen3Integrated:ExampleDesign:ProjectOpened}
\end{figure}
\subsection{Generating the Gen3 Integrated Block for PCI Express}
\label{Sec:Gen3Integrated:Generating}
The following steps are not required for general users. See the instructions
above for how to compile RIFFA.

Alternatively, it is possible to generate the PCIe Endpoint with different
settings than those provided in the example design. Changing the endpoint
settings is required when changing the parameters
\RIFFAParameter{C\_PCI\_DATA\_WIDTH}, \RIFFAParameter{C\_MAX\_PAYLOAD\_BYTES}
and \RIFFAParameter{C\_LOG\_NUM\_TAGS}. The \RIFFAParameter{C\_NUM\_LANES} is a
parameter in the top level file of each example project. How these parameters
relate to IP core settings is highlighted in the following figures.

If the goal is to generate a RIFFA design completely from scratch, each board
directory comes with a RIFFA wrapper verilog file and instantiates a
vendor-specific translation layer. It is highly recommended to re-use these
files RIFFA wrapper when creating designs from scratch.

To generate the PCIe IP select the 7 Series Integrated Block for PCI Express
after selecting the IP Catalog shown in
Figure~\ref{Fig:Gen3Integrated:ExampleDesign:ProjectOpened}. This will open the
IP Customization window as shown in
Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabBasic}

\begin{figure}[H]
  \includegraphics[width=\figurewidth,center]{Gen3IntegratedTabBasic.png}
  \caption{Basic settings tab.}
  \label{Fig:Gen3Integrated:Generating:Gen3IntegratedTabBasic}
\end{figure}
First, select ``ADVANCED'' from the drop down menu. This will cause more tabs to
appear in the bar. The following tabs are not used during customization: MSIx Cap (Capabilities),
Extd. Capabilities 1, and Extd Capabilites 2.

In this example, we have set the \ConfigSetting{Xilinx Development Board} to
\ConfigSetting{VC709}, and selected the PCIe Gen1 rate of \ConfigSetting{2.5
  GT/s}, and a Lane Width of 8 (\RIFFAParameter{C\_NUM\_LANES} = 8). We have
chosen to set the \ConfigSetting{AXI Interface Width} to \ConfigSetting{64-bits}
(\RIFFAParameter{C\_PCI\_DATA\_WIDTH} = 64). Finally Clear the
\ConfigSetting{Disable Client Tag} and \ConfigSetting{PCIe DRP Ports} boxes. The
choice of \ConfigSetting{Link Rate}, \ConfigSetting{Lanes}, and
\ConfigSetting{Interface Width} will allow different AXI Interface Frequencies
to be selected. The RIFFA core will run at this clock frequency, but the user
logic can run at whatever frequency it desires.

Optional: Set the Component Name of the PCI Express block, and the IP
Location. In our example projects, we use the name template
PCIeGen\textbf{W}x\textbf{Y}If\textbf{Z} where \textbf{W} is the PCI Express
Version (\ConfigSetting{Link Speed} in
Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabBasic}),
\textbf{Y} is the lane width, and \textbf{Z} is the AXI interface width.  The IP
location is the \Directory{ip} directory in the example project.

Note: For RIFFA \RIFFAVer~the 256-bit interface is not supported, however the
128-bit interface is. This means PCIe Gen2 with 8 lanes, and PCIe Gen3 with 4
lanes are both supported.

\begin{figure}[H]
  \includegraphics[width=350px,center]{Gen3IntegratedTabCapabilities.png}
  \caption{PCI Express Capabilities Tab.}
  \label{Fig:Gen3Integrated:Generating:Gen3IntegratedTabCapabilities}
\end{figure}
In the Capabilities tab shown in
Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabCapabilities} check
the \ConfigSetting{Extended Tag Field} box. If the \ConfigSetting{Extended Tag
  Field} is selected \RIFFAParameter{C\_LOG\_NUM\_TAGS} = 8, otherwise
\RIFFAParameter{C\_LOG\_NUM\_TAGS} = 5. Set the \ConfigSetting{PFO Max Payload
  Size} from the dropdown menu; Use this to set the RIFFA
\RIFFAParameter{C\_MAX\_PAYLOAD\_BYTES} parameter.

Note: Maximum Payload sizes are typically set by the BIOS, and 256 bytes seems
to be standard. RIFFA will default to the minimum of
\RIFFAParameter{C\_MAX\_PAYLOAD\_SIZE} and the setting in your BIOS. Unless your
BIOS is modified, or can support substantially larger packets, there will be no
performance benefit to increasing the payload size. Increasing the maximum
payload size will increase the resources consumed.

\begin{figure}[H]
  \includegraphics[width=350px,center]{Gen3IntegratedTabPF0Ids.png}
  \caption{PCI Express IDs Tab.}
  \label{Fig:Gen3Integrated:Generating:Gen3IntegratedTabPF0Ids}
\end{figure}
The tab in Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabPF0Ids}
is optional. Setting the Device ID may assist in identifying different FPGAs in
a multi-FPGA system. The other options, specifically the Vendor ID, must remain
the same.
\begin{figure}[H]
  \includegraphics[width=350px,center]{Gen3IntegratedTabPF0Bar.png}
  \caption{PCI Express Base Address Registers (BAR) Tab.}
  \label{Fig:Gen3Integrated:Generating:Gen3IntegratedTabPF0Bar}
\end{figure}
The tab in Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabPF0Bar}
must be configured so that BAR0 is \ConfigSetting{enabled}. Select type
\ConfigSetting{Memory}, and Unit \ConfigSetting{Kilobyte}, and
\ConfigSetting{Size Value} \ConfigSetting{1} from the dropdown menus. If these
values are not set correctly the RIFFA driver will not recognize the FPGA
device.
\begin{figure}[H]
  \includegraphics[width=350px,center]{Gen3IntegratedTabLegacyMSICap.png}
  \caption{PCI Express Legacy and MSI Interrupts Tab.}
  \label{Fig:Gen3Integrated:Generating:Gen3IntegratedTabLegacyMSICap}
\end{figure}
In the Legacy/MSI Capabilites tab shown in
Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabLegacyMSICap}, select
\ConfigSetting{None} in the \ConfigSetting{PFO Interrupt Pin Dropdown} menu and
set the \ConfigSetting{PFO Multiple Message Capable} dropdown menu to
\ConfigSetting{1 Vector}
% \begin{figure}[H]
%   \includegraphics[width=350px,center]{Gen3IntegratedTabMSIxCap.png}
%   \label{Fig:Gen3Integrated:Generating:Gen3IntegratedTabMSIxCap}
% \end{figure} 
\begin{figure}[H]
  \includegraphics[width=350px,center]{Gen3IntegratedTabPowerManagement.png}
  \caption{PCI Express Power Management Tab.}
  \label{Fig:Gen3Integrated:Generating:Gen3IntegratedTabPowerManagement}
\end{figure}
In the Power Management tab, shown in
Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabPowerManagement},
ensure that the \ConfigSetting{Performance Level} is set to
\ConfigSetting{Extreme}.
% \begin{figure}[H]
%   \includegraphics[width=350px,center]{Gen3IntegratedTabExtCapabilities1.png}
%   \label{Fig:Gen3Integrated:Generating:Gen3IntegratedTabExtCapabilities1}
% \end{figure}
% \begin{figure}[H]
%   \includegraphics[width=350px,center]{Gen3IntegratedTabExtCapabilities2.png}
%   \label{Fig:Gen3Integrated:Generating:Gen3IntegratedTabExtCapabilities2}
% \end{figure}
\begin{figure}[H]
  \includegraphics[width=350px,center]{Gen3IntegratedTabSharedLogic.png}
  \caption{PCI Express Shared Logic Tab.}
  \label{Fig:Gen3Integrated:Generating:Gen3IntegratedTabSharedLogic}
\end{figure}
In the Shared Logic Tab shown in
Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabSharedLogic}
\ConfigSetting{clear} all of the checkboxes shown. These settings will not
affect the core generated, but will affect the example designs generated by the
Vivado, and make the Vivado example design mirror the RIFFA Example design.
\begin{figure}[H]
  \includegraphics[width=350px,center]{Gen3IntegratedTabCoreInterfaceParameters.png}
  \caption{PCI Express Core Interface Parameters Tab.}
  \label{Fig:Gen3Integrated:Generating:Gen3IntegratedTabCoreInterfaceParameters}
\end{figure}
Finally, in the Interface Parameters tab, match the checkboxes shown in
Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabCoreInterfaceParameters}. These
options simplify the interface to the generated core
\subsection{Creating Constraints files for the VC709 Development Board}
\label{Sec:Gen3Integrated:Generating:Constraints}
When generating a design for the VC709 board, the following constraints will
correctly constrain the clocks. When using a different board, read the user
guide for appropriate pin placment, or copy the constraints from the PCIe
Endpoint Example Design.
\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=tcl,
    commentstyle=\color{red},label=Listing:7SeriesIntegrated:Generating:Constraints:VC709,
    caption=\Xilinx{.xdc} constraints for the VC709 board,frame=single]
  create_clock -period 10.000 -name pcie_refclk [get_pins refclk_ibuf/O]
  set_false_path -from [get_ports PCIE_RESET_N]

  # The following constraints are BOARD SPECIFIC. This is for the VC709
  set_property LOC IBUFDS_GTE2_X1Y11 [get_cells refclk_ibuf]
  set_property PACKAGE_PIN AV35 [get_ports PCIE_RESET_N]
  set_property IOSTANDARD LVCMOS18 [get_ports PCIE_RESET_N]
  set_property PULLUP true [get_ports PCIE_RESET_N]
\end{lstlisting}

\pagebreak
\chapter{Compiling and using the Altera Example Designs}
\label{Chap:Altera}
This section describes how to use RIFFA \RIFFAVer~ with Quartus \QuartusVer. The
example projects included in this distribution target Terasic DE5Net and DE4
boards. We are confident that RIFFA will work on all currently supported Altera
devices using the Hard IP for PCI Express (Cyclone V, Arria V and Stratix V)
devices, as well as all devices using IP Compiler for PCI Express (Stratix IV
and prior). For device support in Quartus \QuartusVer
see \footnote{http://dl.altera.com/devices/}

The FPGA families that we have successfully tested RIFFA \RIFFAVer~are:
\begin{itemize}
\item Stratix V (DE5-Net)
\item Stratix IV (DE4)
  %\item Cyclone IV (DE2i-150)
\end{itemize}
There are three options for starting a new RIFFA project:
\begin{itemize}
\item For first-time users with a DE5 board, we recommend the archived projects
  provided in the \Directory{RIFFA \RIFFAVer/source/fpga/de5\_qsys} directory. Follow the
  instructions in Section~\ref{Sec:Altera:QsysMegawizard:Qsys}
\item Intermediate and advanced users, or users with a DE4 board, we have
  provided projects without instantiated IP. For DE5 boards, follow the
  instructions in Section~\ref{Sec:Altera:QsysMegawizard:Megawizard}. For DE4 boards, follow
  the instructions in Section~\ref{Sec:Altera:IPCompiler}
\item For advanced users, or users wishing to support a new board, we provide
  full instructions for creating a top level and generating IP. Follow the
  instructions in Section~\ref{Sec:Altera:QsysMegawizard}
\end{itemize}

\section{Example Designs with Qsys and MegaWizard (Stratix V, Cyclone V and newer)}
\label{Sec:Altera:QsysMegawizard}
\subsection{Qsys (Stratix V and newer)}
\label{Sec:Altera:QsysMegawizard:Qsys}
For first-time users with the DE5-Net board, copy one of the archived projects
(.qar files) available in the \Directory{de5\_qsys} directory.

\begin{enumerate}
\item Open Quartus to get the introductory screen shown in
  Figure~\ref{Fig:Quartus:WelcomeScreen}.
\item Click 'Open an Existing Project' and navigate to your RIFFA
  \RIFFAVer~directory.
\item In the RIFFA \RIFFAVer~distribution, open \Directory{RIFFA
  \RIFFAVer/source/fpga/de4/} and choose from one of the existing example design
  directories for your board. In the example design directory, locate the
  \Directory{prj} folder and open it. Select the .qpf file and click open. This
  will open the example project, as shown in
  Figure~\ref{Fig:Quartus:ExampleDesign:ProjectOpened}.
\item This project was compiled in Quartus \QuartusVer. The bit file generated
  can be used to test the FPGA system. If you are using a newer version of
  Quartus, recompile the example design or use the programming file provided.
  \begin{itemize}
  \item To recompile the example design, click the compile button in the top
    left corner as shown in
    Figure~\ref{Fig:Quartus:ExampleDesign:ProjectOpened}.
  \item Recompiling your design will generate a new bitfile in the
    \Directory{prj} directory. The bit file in the \Directory{bit} will not be
    changed.
  \end{itemize}
\item To program the FPGA, click 'Open Programmer'. New bit files (generated by
  Quartus) will appear in the \Directory{prj/output\_files/} directory. An example
  bit file is provided in the example design's \Directory{bit} directory.
  \begin{itemize}
  \item Before programming your FPGA, you should install the RIFFA driver. See
    Section~\ref{Sec:RIFFA:Installation}
  \end{itemize}
\item The example design uses the chnl\_tester (shown in
  Figure~\ref{Fig:Quartus:ExampleDesign:chnl_tester}, which works with
  the example software in the \Directory{source/\{C\_C++,Java,python,matlab\}}
  directories. Replace the chnl\_tester instantiation with any user logic,
  matching the RIFFA interface.
\item Recompile the design and program the FPGA Device. Changing the
  \RIFFAParameter{C\_NUM\_CHNL} will change the number of independent channel
  interfaces
\end{enumerate}
\begin{figure}
  \includegraphics[width=300px,center]{QuartusWelcomeScreen.png}
  \caption{Welcome Screen for Quartus \QuartusVer}
  \label{Fig:Quartus:WelcomeScreen}
\end{figure}
\begin{figure}
  \includegraphics[width=300px,center]{IPCompileOpenProject.png}
  \caption{Project Splash Screen for Quartus Projects}
  \label{Fig:Quartus:ExampleDesign:ProjectOpened}
\end{figure}
\begin{figure}
  \includegraphics[width=300px,trim=400 200 300 200, clip=true,center]{QuartusChnlTesterInstantiation.png}
  \caption{chnl\_tester instantiation in the top level file}
  \label{Fig:Quartus:ExampleDesign:chnl_tester}
\end{figure}

\subsection{Generating IP using MegaWizard (Stratix V, Cyclone V and newer)}
\label{Sec:Altera:QsysMegawizard:Megawizard}
In some cases, it may be necessary to generate the PCIe Endpoint IP. For
intermediate users, there are project example projects inside of the
\Directory{de5} directory without instantiated IP (This is done to avoid
licensing problems). For the DE5, the project directories are: DE5Gen1x8If64,
DE5Gen2x8If128, DE5Gen3x4If128.

Modifying the RIFFA parameters \RIFFAParameter{C\_PCI\_DATA\_WIDTH},
\RIFFAParameter{C\_MAX\_PAYLOAD\_BYTES} and \RIFFAParameter{C\_LOG\_NUM\_TAGS}
require changing certain settings in the IP core file. The paramter
\RIFFAParameter{C\_NUM\_LANES} is located in the top level file of each example
project. How these parameters relate to IP core settings is highlighted in the
following figures.

For advanced users whose goal is to generate a RIFFA design completely from
scratch, we provide instructions for generating the timing constraints and other
low level details. Each board directory contains with a RIFFA wrapper verilog
file and instantiates a vendor-specific translation layer. It is highly
recommended to re-use these files RIFFA wrapper when creating designs from
scratch. Users should also use the constraints file (\Altera{.sdc}) in the board
directory, and in the \Directory{constr/}, or read the User Guide provided with
each board and the instructions for generating constratints in
Section~\ref{Sec:Altera:QsysMegawizard:Constraints}.

As stated in Section~\ref{Sec:Intro:Decoding}, each project directory contains five folders.
\begin{itemize}
\item The \Directory{prj/} directory contains the project \Altera{.qpf} and \Altera{.qsf} file.
\item The \Directory{hdl/} contains the top level file, e.g. DE5Gen2x8If128.v, which instantiates the skeleton IP and the RIFFA Core.
\item The \Directory{ip/} directory is empty but will contain Altera IP generated by Quartus
  in the following guide.
\item The \Directory{constr/} directory contains project-specific timing constraint files.
\item Finally the \Directory{bit/} directory contains the project .sof, or bit
  file that we have tested. This bitfile will not be overwritten by subsequent Quartus compilations.
\end{itemize}

Note: The bitfile in the bit directory is not modified by recompilation in
Quartus. Quartus will generate a new bitfile (.sof) in the \Directory{prj/}
directory for the DE5Net board.

\begin{figure}[H]
  \includegraphics[width=350px,center]{QsysPCIExpSystemTrim.png}
  \caption{Qsys Diagram depicting the connections between the three Altera IP blocks.}
  \label{Fig:Altera:QsysMegawizard:Megawizard:QsysPCIExpSystemTrim}
\end{figure}

Altera designs require additional IP to drive the PCIe Core Transcievers. For the
DE5, these blocks are the Transciever Reconfiguration Controller and the
Reconfiguration Driver. When creating a new top level design, these blocks must be
connected together with the PCIe Endpoint as shown in
Figure~\ref{Fig:Altera:QsysMegawizard:Megawizard:QsysPCIExpSystemTrim}.

First, we will generate the PCIe Endpoint. Click on the Avalon Streaming
Interface for PCI Express in the Quartus IP Catalog.
Figure~\ref{Fig:Altera:IPCompiler:PCIeSystemSettingsTab}.

Optional: Set the Component Name of the PCI Express block, and the IP
Location. In our example projects, we typically use the name
PCIeGen\textbf{W}x\textbf{Y}If\textbf{Z} where \textbf{W} is the PCI Express
Version (\ConfigSetting{Link Speed} in
Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabBasic}), \textbf{Y}
is the lane width, and \textbf{Z} is the Avalon interface width. The IP location
is the \Directory{ip/} directory in the example project.

\begin{figure}[H]
  \includegraphics[width=350px,center]{PCIExpressEndpoint1Trim.png}
  \caption{PCI Express Endpoint Configuration Menu}
  \label{Fig:Altera:QsysMegawizard:Megawizard:PCIExpressEndpoint1Trim}
\end{figure}

In Figure~\ref{Fig:Altera:QsysMegawizard:Megawizard:PCIExpressEndpoint1Trim},
select the \ConfigSetting{Number of Lanes}, which corresponds to the top level
parameter \RIFFAParameter{C\_NUM\_LANES}, \ConfigSetting{Lane Rate}, and
\ConfigSetting{PCI Express Base Specification} version from the dropdown menus
(Choose the highest possible base specification version). Select an
\ConfigSetting{Application Interface Width}; This corresponds to the
\RIFFAParameter{C\_PCI\_DATA\_WIDTH} parameter in RIFFA. Currently the
\ConfigSetting{64-bit} and \ConfigSetting{128-bit} interfaces are supported for
all Altera designs. Some widths may not be possible depending on the
\ConfigSetting{Lane Rate} and \ConfigSetting{Number of Lanes} selected.

The choice of \ConfigSetting{Link Rate}, \ConfigSetting{Number of Lanes}, and
\ConfigSetting{Interface Width} will set the frequency for the PCI interface,
which is clocked by the pld\_clk signal. For the chosen settings, the frequency
should be displayed in the messages bar at the bottom of the configuration menu
(Messages bar not shown). The RIFFA core will run at this clock frequency, but the
user logic can run at whatever frequency it desires.

In the Base Address Registers Section set BAR0's type to \ConfigSetting{32-bit
  non-prefetchable memory} and set the size to \ConfigSetting{1 KByte - 10 Bits}.

There are no required changes in the Device Identification Registers
Section. However, in a multiple FPGA system, it may be useful to change the
\ConfigSetting{Device ID} to allow identification of different FPGA
platforms. The other options, specifically the \ConfigSetting{Vendor ID}, must
remain the same.

Scroll down to view the final two sections shown in
Figure~\ref{Fig:Altera:QsysMegawizard:Megawizard:PCIExpressEndpoint2Trim}.
\begin{figure}[H]
  \includegraphics[width=350px,trim=0 0 0 620, clip=true,center]{PCIExpressEndpoint2Trim.png}
  \caption{PCI Express Endpoint Configuration Menu}
  \label{Fig:Altera:QsysMegawizard:Megawizard:PCIExpressEndpoint2Trim}
\end{figure}
In the PCI Express/PCI Capabilities menu, set your desired
\ConfigSetting{Maximum Payload Size}, which corresponds to the RIFFA parameter,
\RIFFAParameter{C\_MAX\_PAYLOAD\_BYTES} and the \ConfigSetting{Number of Tags
  Supported}. The log of the \ConfigSetting{Number of Tags Supported} is
the \RIFFAParameter{C\_LOG\_NUM\_TAGS} parameter in RIFFA.

In the MSI Tab, make sure that the number of MSI messages requested is equal to
1.

Note: Maximum Payload sizes are typically set by the BIOS, and 256 bytes seems
to be standard. RIFFA will default to the minimum setting
\RIFFAParameter{C\_MAX\_PAYLOAD\_SIZE} and the setting in your BIOS. Unless your
BIOS is modified, or can support substantially larger packets, there will be no
performance benefit to increasing the payload size. Increasing the
\ConfigSetting{Maximum Payload Size} will increase the resources consumed.

Finally, record the number of \ConfigSetting{Transciever Reconfiguration
  Interfaces} in the messages bar at the bottom of the screen, then close the
PCIe IP Generation Menu. A window may ask if you wish to generate the example
design. This is optional.
\begin{figure}[H]
  \includegraphics[width=350px,center]{XCVRMenuTrim.png}
  \caption{Transciever Reconfiguration IP Generation Menu}
  \label{Fig:Altera:QsysMegawizard:Megawizard:XCVRMenuTrim}
\end{figure}
Next, generate the Transceiver Reconfiguration Controller by opening MegaWizard
and selecting the Transceiver Reconfiguration Controller megafunction.

Set the appropriate number of \ConfigSetting{Transciever Reconfiguration
  Interfaces} in the Interface Bundles Menu. In the analog features section,
\ConfigSetting{Enable Analog Controls} and \ConfigSetting{Enable Adaptive
  Equalization} block by clicking the appropriate boxes.

Optional: Set the Component Name of the Transciever Reconfiguration Controller, and the IP
Location. In our example projects, we typically use the name
XCVRCtrlGen\ConfigSetting{W}x\ConfigSetting{Y} where \ConfigSetting{W} is the \ConfigSetting{PCI Express Version}
(\ConfigSetting{Link Speed} in
Figure~\ref{Fig:Altera:QsysMegawizard:Megawizard:PCIExpressEndpoint1Trim}), \ConfigSetting{Y}
is the lane width. The IP location is the \Directory{ip/} directory in the
example project.

\begin{figure}[H]
  \includegraphics[width=350px,center]{ReconfigDriverMenu.png}
  \caption{Transciever Reconfiguration Driver Menu}
  \label{Fig:Altera:QsysMegawizard:Megawizard:ReconfigDriverMenu}
\end{figure}

In Qsys, generate the Transciver Reconfiguration Controller. Select the
appropriate lane rate for your design, and the number of Reconfiguration
Intefaces. These should match the number of reconfiguration interfaces dictated
when generating the PCIe IP in
Figure~\ref{Fig:Altera:QsysMegawizard:Megawizard:PCIExpressEndpoint1Trim} and
the number selected in
Figure~\ref{Fig:Altera:QsysMegawizard:Megawizard:XCVRMenuTrim}.

Optional (in Qsys): Set the Component Name of the Transciever Reconfiguration
Driver, and the IP Location. In our example projects, we typically use the name
XCVRDrvGen\ConfigSetting{W}x\ConfigSetting{Y} where \ConfigSetting{W} is the
\ConfigSetting{PCI Express Version} (\ConfigSetting{Link Speed} in
Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabBasic}), and
\textbf{Y} is the lane width. The IP location is the \Directory{ip} directory in
the example project.

If you are using Megawizard to instantiate IP, you must manually instantiate the
Transciver Reconfiguration Driver in the Top Level design. The instantiation
template is shown in
Listing~\ref{Listing:Altera:QsysMegawizard:ManualDriverInst} into your top-level
file.  Match the PCIe generation, and chip generation for your project.
\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=Verilog,
    commentstyle=\color{red},label=Listing:Altera:QsysMegawizard:ManualDriverInst,
    caption=Manual (non-qsys) instantiation of Reconfiguration Driver,frame=single]
  altpcie_reconfig_driver
  #(
  /* These values should match the values used for the PCIe Endpoint */
  .number_of_reconfig_interfaces(10), /*Set This*/
  .gen123_lane_rate_mode_hwtcl(``Gen1 (2.5 Gbps)''), /*Set This*/
  .INTENDED_DEVICE_FAMILY(``Stratix V'')) /*Set This*/
  XCVRDriverGen2x8_inst
  (
  /*Ports Here -- Copy from Example Designs*/
  );
\end{lstlisting}

\pagebreak
\subsection{Creating Constraints files for MegaWizard and QSys Designs}
\label{Sec:Altera:QsysMegawizard:Constraints}
Advanced users may also want to edit and modify the constraint files. This not
required or recommended for novice users. The example designs in the RIFFA
\RIFFAVer~distribution contain appropriate constraint files for the example
designs. However if the need arises, these constraints are documented below.

To appropriately constrain your PCIe reference clocks, place the constraints
shown in Listing~\ref{Listing:Altera:QsysMegawizard:Constraints} in your
\Altera{.sdc} file. Modify the names PCIE\_REFCLK, PCIE\_TX\_OUT and
PCIE\_RX\_IN to match your design.
\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=tcl,
    commentstyle=\color{red},label=Listing:Altera:QsysMegawizard:Constraints,
    caption=\Altera{.sdc} constraints for Qsys and Megawizard designs,frame=single]
  create_clock -name PCIE_REFCLK -period 10.000 [get_ports {PCIE_REFCLK}]
  derive_pll_clocks -create_base_clocks
  derive_clock_uncertainty
\end{lstlisting}
Likewise, copy the constraints in
Listing~\ref{Listing:Altera:QsysMegawizard:QSFSettings} into your .qsf file. Copy
the location assignment commands for each PCIe Pin in your reference design.

\begin{lstlisting}[language=tcl,basicstyle=\footnotesize\ttfamily,commentstyle=\color{red},
    label=Listing:Altera:QsysMegawizard:QSFSettings,
    caption=\Altera{.qsf} settings for Qsys and Megawizard designs,frame=single]
  ###################################################################
  # PCIE Connections
  ###################################################################
  set_location_assignment <PCIE_REFCLK_PIN> -to PCIE_REFCLK
  set_instance_assignment -name IO_STANDARD HCSL -to PCIE_REFCLK
  set_location_assignment <PCIE_REFCLK_PIN(n)> -to ``PCIE_REFCLK(n)''
  set_instance_assignment -name IO_STANDARD HCSL -to ``PCIE_REFCLK(n)''
  set_location_assignment <PCIE_RESET_N> -to PCIE_RESET_N
  set_instance_assignment -name IO_STANDARD ``2.5 V'' -to PCIE_RESET_N

  # For each PCIE Lane (L) set the pin locations from the board user guide!
  ######################################################################
  #PCIE TX_OUT L
  ######################################################################
  set_location_assignment <TX_LANE[L]_PIN>   -to PCIE_TX_OUT[0]
  set_location_assignment <TX_LANE[L]_PIN(n)> -to ``PCIE_TX_OUT[0](n)''

  ###################################################################
  #PCIE RX_IN L
  ###################################################################
  set_location_assignment <RX_LANE[L]_PIN>  -to PCIE_RX_IN[L]
  set_location_assignment <RX_LANE[L]_PIN(n)> -to ``PCIE_RX_IN[L](n)''
\end{lstlisting}
\pagebreak

\section{IP Compiler for PCI Express (Stratix IV, and older)}
\label{Sec:Altera:IPCompiler}
To avoid licensing problems, we do not package Altera IP for the DE4
board. Manual IP Instantiation is required when using the DE4 Board, and similar
devices using the IP Compiler for PCI Express. Changing the endpoint settings
described here may change the RIFFA para\-meters
\RIFFAParameter{C\_PCI\_DATA\_WIDTH}, \RIFFAParameter{C\_MAX\_PAYLOAD\_BYTES}
and \RIFFAParameter{C\_LOG\_NUM\_TAGS}. How these parameters relate to IP core
settings is highlighted in the following figures.

There are sever example projects inside of the \Directory{de4} directory folder
without instantiated IP. For the DE4 board, these
projects are DE4Gen1x8If64, DE4Gen2x8If128.% and DE2Gen1x1If64 for the DE2i-150
board.

As stated in Section~\ref{Sec:Intro:Decoding}, each project directory contains
five folders.
\begin{itemize}
\item The \Directory{prj/} directory contains the project \Altera{.qpf} and \Altera{.qsf} file.
\item The \Directory{hdl/} contains the top level file, e.g. DE5Gen2x8If128.v,
  which instantiates the skeleton IP and the RIFFA Core.
\item The \Directory{ip/} directory is empty but will contain Altera IP generated by Quartus
  in the following guide.
\item The \Directory{constr/} directory contains project-specific timing constraint files.
\item Finally the \Directory{bit/} directory contains the project .sof, or bit
  file that we have tested. This bitfile will not be overwritten by subsequent
  Quartus compilations.
\end{itemize}

\subsection{Generating IP with IP Compiler for PCI Express (Stratix IV, and older)}
\label{Sec:Altera:IPCompiler:Generating}
Note: The bitfile in the bit directory is not modified by recompilation in
Quartus. Quartus will generate a new bitfile (.sof) in the
\Directory{/output\_files} directory for the DE4 board.

First, we will generate the PCIe Endpoint. Open the Altera IP Catalog and select
the IP Compiler for PCI Express. This will open the window shown in
Figure~\ref{Fig:Altera:IPCompiler:PCIeSystemSettingsTab}.

Optional: Set the Component Name of the PCI Express block, and the IP
Location. In our example projects, we typically use the name
PCIeGen\ConfigSetting{W}x\ConfigSetting{Y}If\ConfigSetting{Z} where \ConfigSetting{W} is the PCI Express
Version (Link Speed in
Figure~\ref{Fig:Gen3Integrated:Generating:Gen3IntegratedTabBasic}), \ConfigSetting{Y}
is the lane width, and \ConfigSetting{Z} is the Avalon interface width. The IP location
is the \Directory{ip} directory in the example project.

In this guide, we will skip the Power Mangement tab shown in
Figure~\ref{Fig:Altera:IPCompiler:PCIeSystemSettingsTab}.
\begin{figure}[H]
  \includegraphics[width=350px,center]{IPCompilerPCIeTabSystemSettings.png}
  \caption{IP Compiler for PCI Express System Settings Tab}
  \label{Fig:Altera:IPCompiler:PCIeSystemSettingsTab}
\end{figure}
In the first column of the System Settings Tab, select your \ConfigSetting{Chip
  Generation/PHY Type} (\ConfigSetting{Stratix IV GX} for the DE4 board),
\ConfigSetting{Lanes}, and \ConfigSetting{Max Rate}. The \ConfigSetting{Number
  of Lanes} is the parameter \RIFFAParameter{C\_NUM\_LANES} in the project top
level file. In the second column, select the \ConfigSetting{PCI Express Version}
(2.0, or the highest possible) and set the \ConfigSetting{Test Out Width} to
\ConfigSetting{0}. In the third column, select the \ConfigSetting{Application Interface
  Width}. The \ConfigSetting{Application Interface Width} corresponds to the RIFFA parameter
\RIFFAParameter{C\_PCI\_DATA\_WIDTH}.

The choice of \ConfigSetting{Link Rate}, \ConfigSetting{Lanes}, and
\ConfigSetting{Interface Width} will set the frequency for the PCI interface,
which is clocked by the signal pld\_clk. For the chosen settings, the frequency
is determined in the Chip User Guide (though, typically it is one of 62.5, 125,
or 256 MHz)

\begin{figure}[H]
  \includegraphics[width=350px,center]{IPCompilerPCIeTabPCIRegisters.png}
  \caption{IP Compiler for PCI Express Registers Tab}
  \label{Fig:Altera:IPCompiler:PCIRegisters}
\end{figure}
PCI Registers Tab, shown in Figure~\ref{Fig:Altera:IPCompiler:PCIRegisters}, set
BAR0's type to ``32-bit non-prefetchable memory''. Set the size to ``1 KByte -
10 Bits''.

There are no required changes in the PCI Registers Tab. However, in a multiple
FPGA system, it may be useful to change the \ConfigSetting{Device ID} to
identify different FPGA platforms. The other options, specifically the
\ConfigSetting{Vendor ID}, must remain the same.
\begin{figure}[H]
  \includegraphics[width=350px,center]{IPCompilerPCIeTabCapabilities.png}
  \caption{IP Compiler for PCI Express Capabilities Tab}
  \label{Fig:Altera:IPCompiler:PCIeCapabilitiesTab}
\end{figure}
Open the Capabilities Tab shown in
Figure~\ref{Fig:Altera:IPCompiler:PCIeCapabilitiesTab}. In the Device
Capabilities box, set the \ConfigSetting{Tags Supported} to
\ConfigSetting{64}. The log of the maximum number of tags supported is the RIFFA
parameter \RIFFAParameter{C\_LOG\_NUM\_TAGS} parameter in RIFFA. In the MSI
Capabilities box, set the number of \ConfigSetting{MSI Messages Requested} to
\ConfigSetting{1}. All the remaining settings must stay the same.
\begin{figure}[H]
  \includegraphics[width=350px,center]{IPCompilerPCIeTabBufferSetup.png}
  \caption{IP Compiler for PCI Express Buffer Setup Tab}
  \label{Fig:Altera:IPCompiler:PCIeBufferSetupTab}
\end{figure}
In Figure~\ref{Fig:Altera:IPCompiler:PCIeBufferSetupTab} select the
\ConfigSetting{Maximum Payload Size} from the dropdown menu. Use this to set the
\RIFFAParameter{C\_MAX\_PAYLOAD} parameter. Set the numer of \ConfigSetting{Virtual Channels} to
\ConfigSetting{1}.

Note: \ConfigSetting{Maximum Payload} sizes are typically set by the BIOS, and
256 bytes seems to be standard. RIFFA will default to the minimum of
\RIFFAParameter{C\_MAX\_PAYLOAD\_SIZE} and the setting in your BIOS. Unless your
BIOS is modified, or can support substantially larger packets, there will be no
performance benefit to increasing the payload size. Increasing the
\ConfigSetting{Maximum Payload} size will increase the resources consumed.

Next, we need to generate the PLL for the example design. Select the ALTPLL
megafunction from the Quartus IP Catalog, to open the window shown in
Figure~\ref{Fig:Altera:ALTPLL:TabGeneral}.
\begin{figure}[H]
  \includegraphics[width=350px,center]{ALTPLLTabGeneral.png}
  \caption{ALTPLL General Settings Tab}
  \label{Fig:Altera:ALTPLL:TabGeneral}
\end{figure}
In Figure~\ref{Fig:Altera:ALTPLL:TabGeneral}, select the \ConfigSetting{Speed
  Grade} that matches your board (Found in the User Guide and online). Next set
the input clock frequency. The DE4 board provides 50 MHz clock inputs and we use
these for convenience. The remaining settings are unchaged. Click on the
Inputs/Lock tab to move on to Figure~\ref{Fig:Altera:ALTPLL:TabInputs}.

Optional: Set the name of the ALTPLL block. In the example designs we use the
name ALTPLL50I50O125O250O, for 50 MHz Input clock, 50, 125, and 250 MHz output
clocks. The 125 and 50 MHz Clocks are required for the PCIe Endpoint.

\begin{figure}[H]
  \includegraphics[width=350px,center]{ALTPLLTabInputs.png}
  \caption{ALTPLL Input Settings Tab}
  \label{Fig:Altera:ALTPLL:TabInputs}
\end{figure}
Match the settings shown in Figure~\ref{Fig:Altera:ALTPLL:TabInputs}. In the
Output Clocks Section, create a 50 MHz output clock, 125 MHz clock, and 250 MHz
Clock. Click Finish when done.

% \begin{figure}[H]
%   \includegraphics[width=350px,center]{ALTPLLTabBandwidth.png}
%   \caption{ALTPLL Bandwidth Tab}
%   \label{Fig:Altera:ALTPLL:TabBandwidth}
% \end{figure}
% \begin{figure}[H]
%   \includegraphics[width=350px,center]{ALTPLLTabSwitchover.png}
%   \caption{ALTPLL Switchover Tab}
%   \label{Fig:Altera:ALTPLL:TabSwitchover}
% \end{figure}

Finally, we generate the ALTGX\_RECONFIG Megafunction. Select the
ALTGX\_RECONFIG megafunction from the Quartus IP Catalog to produce the widown
shown in Figure~\ref{Fig:Altera:ALTGX:TabSettings}.
\begin{figure}[H]
  \includegraphics[width=350px,center]{ALTGXReconfigTabSettings.png}
  \caption{ALTGX Reconfiguration Settings Tab}
  \label{Fig:Altera:ALTGX:TabSettings}
\end{figure}
In the Reconfiguration Tab shown in Figure~\ref{Fig:Altera:ALTGX:TabSettings}, set the \ConfigSetting{Number of Channels}. This should be equal to the number of PCIe Lanes at the Top Level. In the Features Section, \ConfigSetting{Enable Analog Controls}. Match the settings in the remaining windows, shown in Figure~\ref{Fig:Altera:ALTGX:TabInputs},Figure~\ref{Fig:Altera:ALTGX:TabChannel}, and Figure~\ref{Fig:Altera:ALTGX:TabError}.

\begin{figure}[H]
  \includegraphics[width=350px,center]{ALTGXReconfigTabAnalog.png}
  \caption{ALTGX Reconfiguration Analog Settings Tab}
  \label{Fig:Altera:ALTGX:TabInputs}
\end{figure}

\begin{figure}[H]
  \includegraphics[width=350px,center]{ALTGXReconfigTabChannel.png}
  \caption{ALTGX Reconfiguration Channel Tab}
  \label{Fig:Altera:ALTGX:TabChannel}
\end{figure}

\begin{figure}[H]
  \includegraphics[width=350px,center]{ALTGXReconfigTabError.png}
  \caption{ALTGX Reconfiguration Error Tab}
  \label{Fig:Altera:ALTGX:TabError}
\end{figure}

\pagebreak
\subsection{Creating Constraints files for IP Compiler Designs}
Advanced users may also want to edit and modify the constraint files. This not
required or recommended for novice users. The example designs in the RIFFA
\RIFFAVer~distribution contain appropriate constraint files for the example
designs. However if the need arises, we demonstrate the constraints we used
below.

If the goal is to generate a RIFFA design completely from scratch, each board
directory comes with a RIFFA wrapper verilog file and instantiates a
vendor-specific translation layer. It is highly recommended to re-use these
files RIFFA wrapper when creating designs from scratch. Users should also use
the constraints file (\Altera{.sdc}) in the board directory, and in the
\Directory{constr/}, or read the User Guide provided with each board.

To appropriately constrain your PCIe reference clocks, place the constraints
shown in Listing~\ref{Listing:Altera:IPCompiler:Constraints} in your
\Altera{.sdc} file. Modify the name of the osc\_50MHz and PCIE\_REFCLK ports to
match your design
\begin{lstlisting}[language=tcl,basicstyle=\footnotesize\ttfamily,commentstyle=\color{red},
    label=Listing:Altera:IPCompiler:Constraints,
    caption=\Altera{.sdc} constraints for Qsys and Megawizard designs,frame=single]
  create_clock -name PCIE_REFCLK -period 10.000 [get_ports {PCIE_REFCLK}]
  create_clock -name osc_50MHz -period 20.000 [get_ports {OSC_BANK3D_50MHZ}]
  derive_pll_clocks -create_base_clocks
  derive_clock_uncertainty
  # 50 MHZ PLL Clock
  create_generated_clock -name clk50 -source [get_ports {OSC_50_BANK2}] \
  [get_nets {*|altpll_component|auto_generated|wire_pll1_clk[0]}]
  # 125 MHZ PLL Clock
  create_generated_clock -name clk125 -multiply_by 5 -divide_by 2 -source \
  [get_ports {OSC_50_BANK2}] \
  [get_nets {*|altpll_component|auto_generated|wire_pll1_clk[1]}]
  # 250 MHZ PLL Clock
  create_generated_clock -name clk250 -multiply_by 5 \ 
  -source [get_ports {OSC_50_BANK2}] [get_nets \
    {*|altpll_component|auto_generated|wire_pll1_clk[2]}]
\end{lstlisting}

Likewise, copy the constraints in
Listing~\ref{Listing:Altera:IPCompiler:QSFSettings} into your \Altera{.qsf}
file. Copy the location assignment commands for each PCIe Pin in your reference
design.
\begin{lstlisting}[language=tcl,basicstyle=\footnotesize\ttfamily,
    commentstyle=\color{red},label=Listing:Altera:IPCompiler:QSFSettings,
    caption=\Altera{.qsf} settings for IP Compiler Designs,frame=single]
  ###################################################################
  # PCIE Connections
  ###################################################################
  set_location_assignment <PCIE_REFCLK_PIN> -to PCIE_REFCLK
  set_instance_assignment -name IO_STANDARD HCSL -to PCIE_REFCLK
  set_location_assignment <PCIE_REFCLK_PIN(n)> -to ``PCIE_REFCLK(n)''
  set_instance_assignment -name IO_STANDARD HCSL -to ``PCIE_REFCLK(n)''
  set_location_assignment <PCIE_RESET_N> -to PCIE_RESET_N
  set_instance_assignment -name IO_STANDARD ``2.5 V'' -to PCIE_RESET_N

  # For each PCIE Lane (L) set the pin locations from the board user guide!
  ######################################################################
  #PCIE TX_OUT L
  ######################################################################
  set_location_assignment <TX_LANE[L]_PIN>   -to PCIE_TX_OUT[L]
  set_location_assignment <TX_LANE[L]_PIN(n)> -to ``PCIE_TX_OUT[L](n)''
  ###################################################################
  #PCIE RX_IN L
  ###################################################################
  set_location_assignment <RX_LANE[L]_PIN>  -to PCIE_RX_IN[L]
  set_location_assignment <RX_LANE[L]_PIN(n)> -to ``PCIE_RX_IN[L](n)''
\end{lstlisting}
\pagebreak
\chapter{Developer Documentation}
This chapter describes RIFFA \RIFFAVer~at a level of detail that is useful for
RIFFA developers. Users of RIFFA should not read this section until they are
comfortable developing for RIFFA or have experience with PCIe and DMA
concepts. 
\section{Architecture Description}
\begin{figure}[H]
\includegraphics[width=350px,center]{RIFFA.pdf}
  \caption{High level RIFFA Diagram}
  \label{Fig:Developer:HighLevel:Full}
\end{figure}
\begin{itemize}
\item \textbf{\color{red}{IP Interfaces}} The Vendor IP interfaces provied
  low-level access to the PCIe bus. Each vendor provides a set of signals for
  communicating over PCIe. Xilinx FPGAs without PCIe Gen3 support provide an
  interface very similar to Altera FPGAs. We call this the ``Classic
  Interface''. Newer Xilinx devices with PCIe Gen3 support have completely
  different non-compatible interfaces (CC, CQ, RC, RQ instead of RX and TX). We
  call this the ``Xilinx Ultrascale Interface''.

  \textit{Files: \Xilinx{*.xci}, \Altera{*.qsys} (And others generated by vendor
    tools)}

\item \textbf{\color{orange}{Translation Layer}} The Translation Layer provides
  a set of vendor-independent interfaces and signal names. 

  There is one translation layer for each interface. The ``Classic Translation
  Layer'' provides a set of interfaces (RX, TX, Interrupt, and Configuration)
  and vendor independent signal names to higher layers. There is very little
  logic in these layers, and there should be no timing-critical logic here.

  The ``Ultrascale Translation Layer'' operates on the ultrascale
  interface. Similar to the classic translation layer, it contains very little
  logic. It provides the interfaces: RX Completion, RX Request, TX
  Completion, TX Request, Interrupt, and Configuration. 

  \textit{Files: translation\_altera.v, translation\_xilinx.v,
    txc\_engine\_ultrascale.v,\\ txr\_engine\_ultrascale.v}

\item \textbf{\color{yellow}{Formatting Engine Layer}} The Formatting Engine
  Layer is responsible for formatting requests and completions into
  packets. This layer provides four interfaces: RX Completion (RXC) for
  receiving completions (responses to memory read requests), RX Request (RXR)
  for receiving memory read and write requests, TX Completion (TXC) for
  transmitting completions (reponses to memory read requests), and TX Request
  (TXR) for transmitting read and write requests.
  
  The engine layer abstracts vendor specific features, such as Xilinx's
  Classic-Interface Big-Endian requirement and Altera's Quad-word
  Alignment. The C\_VENDOR parameter for the engine layer switches between
  Xilinx, Altera, and Ultrascale logic to produce TLPs (Classic Interface) and
  AXI Descriptors (Ultrascale Interface).

  The RX path of the engine layer has packet parsers for TLPs and AXI
  Descriptors. These are parameterized by width, as of RIFFA 2.2. The TX Path of
  the engine layer has packet formatters for TLPs and AXI Descriptors. 

  As alluded to in the Translation Layer, the Classic IP Cores provide only two
  transmit interfaces (RX, and TX), while the Xilinx Ultrascale IP Core handles
  RX Demultiplexing and multiplexing internally and provides four interfaces
  (RXC, RXR, TXC, and TXR). For this reason, the multiplexing/FIFO logic used in
  the Classic interfaces are not necessary for the Xilinx interface.

  After the Engine-Layer, higher layers should be vendor agnostic, if not bus
  agnostic. The exception will be sideband signals signals. (How much of this
  ideal can be achieved remains to be seen)

  Note: The engine layer currently uses word-aligned addresses, and byte-enable
  signals to specify sub-word addresses. In the future, all addresses will be
  byte-aligned and word enables will be handled in the formatting logic. 

  \textit{Files: engine\_layer.v, schedules.vh, rx\_engine\_classic.v, rxc\_engine\_classic.v,\\
    rxr\_engine\_classic.v, tx\_engine\_classic.v, txc\_engine\_classic.v,\\
    txr\_engine\_classic.v, rx\_engine\_ultrascale.v, rxc\_engine\_ultrascale.v,\\
    rxr\_engine\_ultrascale.v, tx\_engine\_ultrascale.v, txc\_engine\_ultrascale.v, \\
    txr\_engine\_ultrascale.v}
   
\item \textbf{\color{green}{Scatter Gather (SG) DMA Layer}} The Scatter Gather
  DMA Layer handles reading from and writing to scatter gather lists and
  providing the addresses found in these lists to the data-request logic in the
  Data Abstraction layer. In RIFFA, each channel has its own SG DMA list logic.

  The Completion Merge/Reorder buffer handles out-of-order completions. In the
  PCIe specification, a memory request can be serviced by multiple smaller
  completions (the responses must remain in order). Completions from different
  memory requests can be returned in any order. The reorder buffer releases data
  when all of the responses to a memory request have been received.

  Memory read and write requests to the host are multiplexed by the TX Request
  Mux. These are serviced fairly in round robin order.
  
  The Scatter Gather List Readers issue read requests to read data from the
  Scatter Gather List (SGL) created by the driver. This list contains the
  address and length of pages containing data to transmit. When an SGL has been
  exhausted, an interrupt is raised and the SGL is refilled or the transaction
  is comlete.

  Each element in the SGL 128-bit triple: {32'b0, 32'b Length of Data in 32-bit
    words, 64'b Address of Page}. The addresses in this list are provided to the
  DMA Data Read Engine in the Data Abstraction layer. Since the SGL must be a
  single continuous stream of 128-bit elements regardless of the size of the
  interface, gaps and mis-alignments due to packet formatting are removed using
  the Data Packer, which receives its data from the reorder buffer.
  
  The location of the SGL in host memory is written to the BAR Memory space.
  The BAR Memory space is partitioned among the channels. Only the host can
  issue read and write requests to this memory space. Since the memory space is
  partitioned, the RX Request interface and TX Completion interface do not have
  demultiplexing or multiplexing logic.

  A more through treatment of the SG DMA Layer can be found in Sec.~\ref{Sec:Developer:Arch:SGDMA}.

  \textit{Files: reorder\_queue*.v, sg\_list\_reader\_*.v, sg\_list\_requester.v \\ 
    fifo\_packer\_*.v, registers.v, tx\_multiplexer\_*.v}


\item \textbf{\color{blue}{Data Abstraction / DMA Layer}} The Data Abstraction /
  DMA Layer is responsible for making requests to read data from, or write data
  to host memory.

  The read and write addresses are provided by the Scatter Gather list
  readers. Since RIFFA provides a single continuous stream of 32-bit words
  regardless of the size of the interface, gaps and mis-alignments due to packet
  formatting are removed using the Data Packer, which receives its data from the
  reorder buffer. On the TX side, this is not necessary. However a write buffer,
  and other transaction tracking logic is necessary for buffering, and removing
  non-integral data.

  A more through treatment of the Data Abstraction Layer can be found in
  Sec.~\ref{Sec:Developer:Arch:DataAbstraction}.

  \textit{Files: reorder\_queue*.v, rx\_port\_*.v, rx\_port\_reader.v,\\
    fifo\_packer\_*.v, tx\_port\_writer.v tx\_port\_buffer\_*.v tx\_port\_monitor\_*.v}

\item \textbf{\color{purple}{Channel Interface}}
  \textit{Files: rx\_channel\_gate\_*.v, tx\_channel\_gate\_*.v}
\contourlength{.2pt}
\item \contour{black}{\textbf{\color{white}{User Logic}}}
\end{itemize}
\subsection{Scatter Gather DMA Layer}
\label{Sec:Developer:Arch:SGDMA}
READS from the SG lists are prioritized
\subsection{Data Abstraction DMA Layer}
\label{Sec:Developer:Arch:DataAbstraction}
\section{Software Description}

\section{FPGA RX Transfer / Host Send}
\begin{center}
  \begin{tabular}{ | l | l |}
    \hline
    Parameter & Value \\ \hline
    Data Transfer Length & 128 (32-bit words)\\ \hline
    Data Transfer Offsfet & 0\\ \hline
    Data Transfer Last & 1\\ \hline
    Data Transfer Channel & 0\\ \hline
    Data Page Address (DMA) & 0x00000000\_FEED0000\\ \hline
    SGL Head Address & 0x00000000\_BEEF0000\\ \hline
  \end{tabular}
\end{center}

\begin{itemize}
 \item A user makes an call to fpga\_send() to transfer 128 32-bit words of data on Channel 0.  \item The RIFFA driver writes \{32'd128\} to Channel 0's RX Length register, and
   \{31'd0,1'b1\} to Channel 0's RX OffLast register. This notifies the FPGA that a new
   transfer is happening and will raise CHNL\_RX for the user
   application. \textit{Files: rxr\_engine\_*.v, registers.v, channel*.v,
     rx\_port.v rx\_port\_gate.v, rx\_port\_reader.v}
 \item The RIFFA driver allocates an SGL with 1 element (4 32-bit words) at
   address \\\{64'h0000\_0000\_BEEF\_0000\}. The driver fills the list with the
   length and address of the user data:\\
    \{32'd0,32'd128,64'h0000\_0000\_FEED\_0000\}.
 \item The RIFFA driver communicates the address and length of the SGL by
   writing \\\{32'hBEEF0000\} to to Channel 0's RX SGL Address Low register,
   \{32'd0\} to to Channel 0's RX SGL Address High register, and \{32'd4\} to to
   Channel 0's RX SGL Length register. Writing the RX SGL Length register
   notifies the RX SG Engine that a transfer has started, and the low and high
   portions of the 64-bit RX SGL Address are valid. \textit{Files:
     rxr\_engine\_*.v, registers.v, channel*.v, rx\_port.v,
     sg\_list\_requester.v}
 \item The SG List Requester on the FPGA issues a read request for 4 32-bit
   words of data starting at address 0xBEEF0000. The FPGA also issues an
   interrupt. The RIFFA driver reads the Interrupt Status Register of the FPGA
   and determines that Channel 0 has finished reading the RX SGL. \textit{Files:
     sg\_list\_requester.v, rx\_port\_requester\_mux.v, rx\_port\_*.v,
     channel*.v, tx\_multiplexer.v, engine\_layer.v, txr\_engine\_*.v,
     interrupt.v}
 \item The FPGA receieves a completion with 4 32-bit words. After being enqueued
   in the reorder buffer, the completion is delivered to Channel 0, and packed
   into the SGL RX Fifo. \textit{Files: rxc\_engine\_*.v, engine\_layer.v,
     reorder\_queue*.v, fifo\_packer\_*.v}
 \item The RX Port Reader removes the SG element from the FIFO, and issues
   several read requests to receive all 128 32-bit words. \textit{Files:
     rx\_port\_reader.v, rx\_port\_*.v, channel*.v, tx\_multiplexer.v,
     engine\_layer.v, txr\_engine\_*.v, tx\_multiplexer.v}
 \item The completions return interleaved and are reordered in the reorder
   buffer. The reorder buffer releases the completions in order to the fifo
   packer, which puts them in the FIFO. The RX Port Channel Gate issues the data
   to the user. \textit{Files: rxc\_engine\_*.v, engine\_layer.v, reorder\_queue*.v,
     fifo\_packer\_*.v, rx\_port\_reader.v, rx\_port\_channel\_gate.v, channel*.v}
\item The FPGA raises an interrupt with the last word of data is put into the
  Main Data Fifo. The RIFFA driver reads the Interrupt Status Register of the
  FPGA and determines that Channel 0 has finished the RX Transaction. The RIFFA
  driver reads the RX Words Read register to determine how many words were read
  during the transaction.
\item Control is returned to the user.
\end{itemize}
\section{TX Transfer}
\section{FPGA RX Transfer / Host Send}
\begin{center}
  \begin{tabular}{ | l | l |}
    \hline
    Parameter & Value \\ \hline
    Data Transfer Length & 128 (32-bit words)\\ \hline
    Data Transfer Offsfet & 0\\ \hline
    Data Transfer Last & 1\\ \hline
    Data Transfer Channel & 0\\ \hline
    Data Page Address (DMA) & 0x00000000\_FEED0000\\ \hline
    SGL Head Address & 0x00000000\_BEEF0000\\ \hline
  \end{tabular}
\end{center}
\begin{itemize}
 \item A user makes an call to fpga\_recv() to transfer 128 32-bit words of data
   from Channel 0.
 \item The RIFFA driver allocates an SGL with 1 element (4 32-bit words) at
   address\\ \{64'h0000\_0000\_BEEF\_0000\}. The driver fills the list with the
   length and address of the user data:
    \{32'd0,32'd128,64'h0000\_0000\_FEED\_0000\}.
  \item The user application independently raises CHNL\_TX and starts writing data to\\
    CHNL\_TX\_DATA. RIFFA core logic reads transaction parameters from
    CHNL\_TX\_OFF, CHNL\_TX\_LAST, and CHNL\_TX\_LEN and acknowledges them with
    CHNL\_TX\_ACK. \\\textit{Files: tx\_port\_channel\_gate.v}
  \item An interrupt is raised by the FPGA. The RIFFA driver reads the Interrupt
    Status Register of the FPGA and determines that Channel 0 wishes to start a
    new TX Transaction. The driver ISR reads \{32'd128\} from Channel 0's TX
    Length register, and \{31'd0, 1'b1\} from Channel 0's TX OffLast
    register. Reading the OffLast register notifies the FPGA that the new
    transfer has been accepted. \textit{Files: rxr\_engine\_*.v, riffa.v,
      registers.v, channel*.v, tx\_port\_*.v, tx\_port\_writer.v,
      tx\_port\_monitor\_*.v, engine\_layer.v, txc\_engine\_*.v}
 \item The RIFFA driver communicates the address and length of the SGL by
   writing\\ \{32'hBEEF\_0000\} to to Channel 0's TX SGL Address Low register,
   \{32'd0\} to to Channel 0's TX SGL Address High register, and \{32'd4\} to to
   Channel 0's TX SGL Length register. Writing the TX SGL Length register
   notifies the TX SG Engine that a transfer has started, and the low and high
   portions of the 64-bit TX SGL Address are valid. \textit{Files:
     rxr\_engine\_*.v, registers.v, channel*.v, rx\_port.v,
     sg\_list\_requester.v}
 \item The SG List Requester on the FPGA issues a read request for 4 32-bit
   words of data starting at address 0xBEEF0000. The FPGA raises an
   interrupt. The RIFFA driver reads the Interrupt Status Register of the FPGA
   and determines that Channel 0 has finished reading the TX SGL. \textit{Files:
     sg\_list\_requester.v, rx\_port\_requester\_mux.v, rx\_port\_*.v,
     channel*.v, tx\_multiplexer.v, engine\_layer.v, txr\_engine\_*.v,
     interrupt.v}
 \item The FPGA receieves a completion with 4 32-bit words. After being enqueued
   in the reorder buffer, the completion is delivered to Channel 0, and packed
   into the SGL TX Fifo. \textit{Files: rxc\_engine\_*.v, engine\_layer.v,
     reorder\_queue*.v, fifo\_packer\_*.v}
 \item The TX Port Writer removes the SG element from the FIFO, and issues
   several write requests to write all 128 32-bit words. \textit{Files:
     tx\_port\_monitor.v, tx\_port\_writer.v, tx\_port\_*.v, channel*.v, tx\_multiplexer.v,
     engine\_layer.v, txr\_engine\_*.v, tx\_multiplexer.v}
 \item When the last write transaction has been accepted by the core, the FPGA
   raises an interrupt. The RIFFA driver reads the Interrupt Status Register of
   the FPGA and determines that Channel 0 has finished writing data. The RIFFA
   driver reads the TX Words Written register to determine how many words were
   written during the transaction (in case of early termination, or overflow).
   \textit{Files: rxr\_engine\_*.v, riffa.v, interrupt.v, registers.v, channel*.v,
     tx\_port\_*.v, tx\_port\_writer.v, engine\_layer.v, txc\_engine\_*.v}
 \item Control is return to the user because the TX\_LAST signal was set to 1.
\end{itemize}
%\pagebreak
%\chapter{The RIFFA Architecture}
%\section{Translation Layer}
%\section{Engine Layer}
%\section{DMA Layer}
%\section{RIFFA}
%\section{The DMA Engine Layer}
%\chapter{Version Notes}
%\section{References, Notes and Reading Material}
%\section{Core Diagrams}
\end{document}
